Implementing quantum gates by optimal control with doubly exponential convergence 
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We introduce a novel algorithm for the task of coherently controlling a quantum mechanical 
system to implement any chosen unitary dynamics. It performs faster than existing state of the art 
methods by one to three orders of magnitude (depending on which one we compare to), particularly 
for quantum information processing purposes. This substantially enhances the ability to both study 
the control capabilities of physical systems within their coherence times, and constrain solutions for 
control tasks to lie within experimentally feasible regions. Natural extensions of the algorithm are 
also discussed. 
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The coherent control of quantum mechanical systems 
P, Q has been sucessfully applied to a growing num- 
ber of tasks in recent years [3|, |4J . The early approaches 
such as two pathway quantum interference, pump-dump 
schemes, or stimulated Raman adiabatic passage are in- 
trinsically understandable in terms of interference due 
to the coordinated activation of resonant transitions be- 
tween few energy levels Q. In order to extend such 
strategies and tackle more challenging problems, the field 
has moved towards employing pulse shapers and opti- 
mization algorithms [6|. 

The field has also broadened its scope, from problems 
of state preparation, or more generally maximizing the 
value of an observable over an ensemble [7j|, to, in partic- 
ular, implementing unitary maps |8j. This latter bridges 
the gap between physical dynamics and the gate formal- 
ism of quantum information processing, for which high 
accuracy solutions are sought, ultimately aiming to reach 
an error correction threshold around 10 -3 to 10~ 4 per 
gate or below [9( . In applying the same methods to both 
state, and map or gate problems, the additional structure 
inherent to gate problems has however been neglected - 
the aim of this letter is to describe how this structure can 
be exploited to better understand and substantially ease 
the solving of gate problems. 

Formally, a closed iV-level system undergoes controlled 
unitary dynamics given by Uf satisfying 



0_ 

dt 



U f (t) = -iH[f(t)]U f {t), 17,(0) = / 



with / the identity matrix, and a time dependent Hamil- 
tonian 



R 



H[f(t)]=H + J2fr(t)H r 



(1) 



where f is a set of R controls pulses, altering the system 
potential within a semi-classical model under the bilin- 
ear approximation. The abstract control problem for a 
target unitary gate V consists in finding a set of real 



valued functions f and an evolution time T such that 
the dynamics satisfies Uf(T) — V. Since this entails 
transferring a full basis of states to another (with rel- 
ative phases), the intuition applicable to state problems 
is no longer available, and in fact schemes to solve it ex- 
plicitly are limited to two level systems, or special cases 
with few levels. For practical purposes, we would only 
require the actual Uf(T) and target V to match up to 
some prescribed error level e, with respect to a notion of 
distance d. Optimisation, whereby a sequence of pulses 
f < - n - ) is generated iteratively with the requisite distance 
d(U f ( n )(T),V) decreasing at each step, has emerged as 
the strategy of choice for achieving this. The definitions 
of distance d(U, V) to measure the error have generally 
been based on the Hilbert- Schmidt norm, using either 
\\U — V\\ or, quotienting out the unphysical global phase 
of the dynamics and normalising, 



d(U, V) 



1 



2VN f 



min\\U-e lv V\\ 



which we will be using herein. In contrast to state con- 
trol problems where intuitive understanding often plays 
a role in choosing the initial trail pulses f ^ , the serious 
limitations of intuitive insight for gate problems lead to 
f^ ' typically being chosen arbitrarily, eg. at random. 
Moreover, to get any measure of gate error based on ex- 
perimental measurement would require exhaustive and 
arduous process tomography, so that there has been an 
overwhelming preference towards working with numerical 
simulation of a model for the system. 

Running a numerical optimisation algorithm requires 
that the control pulses be discretised, and in order to in- 
corporate experimental constraints we can choose a basis 
for discretisation corresponding to the capabilities of our 
pulse shaping equipment. Thus we let 

K 



f r(*) = X! a rkh{t) 



k=l 



and then optimise over the set of RK coefficients a r k', ide- 



2 



ally the basis elements bk would be precisely calibrated to 
the equipment, but for definiteness we will consider rep- 
resentatives of two important cases. A common choice 
in the literature, and the main one we will use, is that 
of piecewise constant functions, as can be produced by 
an arbitrary waveform generator [l(| ■ In the case of fre- 
quency domain pulse shaping, we are dealing with func- 
tions which, up to Gaussian tails, are both time and spec- 
tral bandwidth limited. To capture this property, under 
suitable scaling we can let bk+\ be the Hermite function 
[28j of index fc, and restrict to the first K of these. Such a 
choice has on the other hand not been used in the quan- 
tum control setting to our knowledge, although the ben- 
efits of Hermite functions have certainly been exploited 
in other applications, eg. [HI - while the smoother al- 
ternatives to piecewise constant functions used, such as 
truncated interpolating polynomials fl2| , have much fat- 
ter tails in frequency domain. In addition to the basis 
constraint on the pulses f , there must clearly be some 
bound B on the pulse fluences, equivalently their magni- 
tude in the integrated power norm. 
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FIG. 1: Performance comparison between several runs of the 
Newton-Raphson (red, labeled 'N'), BFGS GRAPE (green, 
labeled 'B') and Krotov (blue, labeled 'K') algorithms with 
small initial pulses f^K Also shown (in black, labeled 'A') 
are Newton-Raphson runs initialised at the norm with least 
ill-conditioning; the cost of finding this norm, on average 21 
seconds, is included. 

For generic intrinsic and control Hamiltonians 
H , ... , Hr of ((T|), as well as many specific cases of physi- 
cal interest, full controllability is known to hold, meaning 
that any target gate can be achieved given sufficient evo- 
lution time T and freedom in shaping the control pulses 
13]. While this is a strong result, it does not identify 
which gates are achievable for particular evolution times 
and constraints on the controls, in any given experimen- 
tal context. Indeed, such specific results are currently 
lacking, and one must resort to numerical investigations 



in order to gain better understanding into the capabili- 
ties of each physical system [3] ■ This motivates the need 
for an optimisation algorithm to solve the gate problem 
in runtimes on a scale rendering the process interactive 
or faster. It is the purpose of this letter to introduce a 
Newton-Raphson root finding approach for this problem, 
which performs substantially faster than existing meth- 
ods (see Fig. [TJ, thereby achieving this goal on non- 
trivial examples. Such an approach also sheds light on 
the strong influence of the pulse initialisation f lead- 
ing to a prescription for how to choose it. 

For consistency, all numerical examples herein are 
for the canonical problem of implementing a quantum 
Fourier transform on a five qubit Ising coupled spin chain 
in a magnetic field gradient using two controls. But these 
are qualitatively representative of the results for differ- 
ent evolution times, target gates, and modes of control 
- a different problem scenario illustrates such similarity 
in the supplement. In terms of Pauli matrices a, the 
Hamiltonian in question is 



H [f (*)] = £ - j> + 2) 

n—1 n—1 

5 5 

+ f 1 (t)J2a^+f 2 (t)J2„^ (2) 

n—1 n—1 

while we fix an evolution time T of 125 and use K = 1000 
basis functions, with piecewise constant controls unless 
stated otherwise. 

Over the last fifteen years, most techniques success- 
fully applied to model-based quantum control problems 
have either come from mainstream gradient-driven opti 



misation theory, eg. conjugate gradient and BFGS 15] 
based GRAPE [16] algorithms, or can be understood in 
this context, as with the Krotov method [13] • These 
state of the art techniques have led to advances such 
as towards implementing logic gates fault-tolerantly Il8[ 
or with minimal errors given the decoherence time [19j. 
They owe their performance to the use of gradient in- 
formation, but a key realisation is that the full Jacobian 
matrix J of Uf(T) for the gate problem can be computed 
as efficiently as its single row constituting the grad ient 
vector. Indeed the usual gradient computation 12 Oj ]. for 
H^MT) — V\\ 2 say, effectively proceeds through J by in- 
ner producting each row of J with V, so that using the 
gradient alone means discarding a lot of valuable infor- 
mation. 

Looking at the singular value decomposition of the Ja- 
cobian matrix leads to a clean geometric picture, whereby 
changes to the pulses below a certain norm r (beyond 
which higher order terms cease being negligible) induce 
changes in the implemented gate within a prescribed el- 
lipsoid. The basic Newton-Raphson iteration [2l| then 
consists in using this Jacobian model, with a heuristic 
choice of r, to compute new pulses bringing the imple- 



3 



merited U (T) closer to the target V, which reduces to a 
linear algebraic task. In order to have the modelling el- 
lipsoid strictly track the unitary group, one can map its 
elements down via the matrix exponential, or conversely, 
group elements up via the matrix logarithm, as we de- 
scribe later. The volume of this ellipsoid determines the 
ability of all algorithms mentioned herein to shift Uf(T) 
in general directions, while for the specific target V a 
more relevant quantity correlated to this volume is the 
distance to exact solution controls upon ignoring higher 
order terms, which we shall refer to as the level of ill- 
conditioning. 
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FIG. 2: For Newton-Raphson runs with initial pulses f^ ' of 
different norms, (b) the wall time needed to reach an error e 
of 1(T 4 and (c) the norm of the corresponding solution pulses, 
with a dashed 'initial equals final' line. In addition, (a) the 
ill-conditioning of the Jacobian at several randomly sampled 
pulses of each norm. 



The situation for our test problem depicted in Fig. [2] is 
representative of the general structure, as described in de- 
tail in the supplemental material, which can be expected 
of all problems. We see that both the ill-conditioning and 
the Newton-Raphson runtime strongly depend on the in- 
tegrated power of the pulses (specifically of f < -°- ) in the 
latter case), but are well concentrated beyond this along 
a single curve, with the minimum of these two curves 
coinciding. When solution pulses are required to have a 
fluence below B, we can cheaply find the minimum of the 
ill-conditioning curve restricted to the interval [0,4B/5] 
say, and use an arbitrary f ^ with this norm to initialise 
the algorithm. It is clear from Fig. [2jc) that this choice 
will yield a solution satisfying the fluence constraint un- 
less no choice can, while the correspondance between ill- 



conditioning and runtime curves makes it the most ef- 
ficient choice. The benefit of this prescription over less 
deliberate ones is evident from comparing the 'A' and 'N' 
series of runs in Fig. [T] 

The original equation Uf(T) = V should be thought of 
as over-determined when the dimension N 2 of the uni- 
tary group is greater than that of the control space RK, 
since then it is only solvable to arbitrarily high accuracy 
for an exceptional set of targets V. This makes it an 
unfavourable case in the context of low error control, be- 
cause it is implausible for a target gate V of interest to 
be special in this sense. On the other hand, whenever 
solvability is not so limited then almost every achievable 
target V admits an RK — N 2 dimensional set of control 
pulses implementing it. Although the number of itera- 
tions for algorithms to reach a given error tolerance e 
is, as expected, reduced as this dimension of degeneracy 
increases, there is a counter-intuitive downside to under- 
determined problems. 

In general when converging to an exact solution, the 
error of Krotov iterates decays exponentially, ie. even- 
tually as 7™ for some 7 < 1, while with conjugate gra- 
dient or BFGS, the error decay is faster than exponen- 
tial, and Newton algorithms have error decaying doubly 
exponentially [22j |. as 0(/3 2 ) for some /3. But in the 
under-determined context, one can only count on expo- 
nential convergence from all of these algorithms except 
for Newton-Raphson root finding which retains its double 
exponential convergence. Indeed, the directions of degen- 
eracy about a solution form a null space to the Hessian 
there, rendering inapplicable the analysis (2§| on which 
faster than exponential convergence results for BFGS are 
based 



24, 



25]. The stark difference between these rates 
is illustrated in Fig. |3l where all Newton-Raphson runs 
surpass I0 -4 in a single iteration once they reach IO -2 
error. 

In order to make best use of Newton-Raphson root 
finding, we must re-formulate our problem over a linear 
space, and the most natural choice here is to seek for the 
functional 

£(f) = log (V^U f (T)) 

to equal zero within the space u of anti-Hermitian matri- 
ces. The resulting algorithm then has the elegant prop- 
erty of reducing the geodesic distance between the actual 
Uf(T) and target gate V on each iteration. It can use- 
fully be made more general by restricting attention to a 
subspace of u, specified by an orthogonal projection P, ie. 
seeking a zero of P£(f). In particular, restricting to the 
space su of traceless anti-Hermitian matrices makes the 
root finding insensitive to the unphysical global phase. 

A less obvious application is to implement a gate on a 
system interacting coherently with an environment [26j j . 
which contrary to Markovian interaction with a bath is 
reversible so need not fundamentally limit the achievable 
error. The full Hilbert space then splits as S <E) E, and 
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FIG. 3: Illustration of the different convergence behaviours 
of Newton-Raphson (red, labeled 'N'), BFGS GRAPE (green, 
labeled 'B') and Krotov (blue, labeled 'K') algorithms with 
initial norms having minimal ill-conditioning. Although sev- 
eral runs were carried out from very different initial pulses, 
the performance profile of each algorithm remains the same. 

for a given gate on the system W our aim would be to 
implement any gate of the form W <8> A for an ancillary 
evolution A - this corresponds to letting V = W ® J in 
£ and projecting it out of the space I ® u with P. Al- 
though the Newton-Raphson algorithm can certainly be 
applied to Lindblad dynamics, choosing a short evolu- 
tion time T to limit dissipation and finding a control for 
the system without bath should still be a first step, since 
computing the evolution super-operator is much more ex- 
pensive. In both extensions, having RK N 2 becomes 
far from sufficient to justify concluding the problem is 
exactly solvable. 



basis to find spectrally narrow solution pulses. Note 
that the same bandwidth as seen in the right panel of 
Fig. @] could be obtained by a suitable basis of low fre- 
quency Fourier components, but implementing such a 
pulse in a finite time duration would lead to distortion, 
thereby deteriorating the achieved error. Interestingly, 
using the same number of basis functions K, the curve 
from Fig. [2jc) and the most efficient initial fluence re- 
main unchanged across both bases - the average number 
of iterations to reach the same error tolerance e is also 
similar (10. 1 vs. 12.5) for this initial fluence. The piece- 
wise constant basis is however special in how operations 
are cheaper with it than in general bases, eg. for our 
test problem computing the propagator Uf (T) takes 0.9 
seconds, as opposed to around 50 seconds in the Hermite 
basis. This makes it all the more important to choose 
the initial f in a general basis carefully, and to this 
end information from the more tractable piecewise con- 
stant case seems to suffice. 

We have seen how viewing unitary map control prob- 
lems from the root finding perspective motivates an al- 
gorithm offering vast performance improvements over ex- 
isting methods, and reveals particularly clean structure 
within the space of controls. This formulation can more- 
over naturally be made in the full generality of pulses 
represented in arbitrary bases and accounting for an en- 
vironment to the system. For state preparation problems, 
considering the Newton-Raphson algorithm analogue 
in that case promises to lead to further fruitful develop- 
ments. 

This work was funded by EPSRC, via 
CASE/CNA/07/47, and Hitachi. The author wishes to 
thank Sophie Schirmer and Peter Pemberton-Ross for 
valuable exchanges. 




FIG. 4: Spectrum of typical minimal norm solutions with er- 
ror below 10 -4 , in the piecewise constant (left) and Hermite 
function (right) bases. These are symmetric about uj — 0, but 
the Hermite functions could just as easily be made bandlim- 
ited about any chosen carrier frequency. 

Up until now, all numerical examples have been in the 
piecewise constant basis, but our algorithm applies to 
general bases, and of particular interest is the Hermite 
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SUPPLEMENTAL MATERIAL 

We will now describe the algorithm as implemented for 
producing numerical results in this work, aiming to in- 
clude sufficient technical detail that it may be fully repli- 
cated based on this account. This specification focuses on 
the resulting performance more than ease of implemen- 
tation, and most of its prescriptions can be legitimately 
simplified as needed. A source code release, to be made 
available as a Python SciKit 17[ under the code-name 
CUMIN, is also in preparation. 



Numerical optimisation 

A common thread in derivative- based optimisation [l[ , 
aiming to minimise an objective function 2: : H M — >• R, 
is the repeated update of a variable a storing what can 
be thought of as the current base point for the domain 
R M . We will write a^°\a^\ ... for the sequence of val- 
ues taken by a, but omit the superscript when referring 
to its value at some present state of the algorithm. On 
each iteration, say the n th , a model for € valid locally 
about the outgoing base point a'™ -1 ' for this iteration 
is constructed based on the value of certain derivatives 



of function € at the point a/" -1 ). The idea is then to 
take for the next base point the minimum v of this 
model for €, possibly restricted to some neighbourhood 
of a(™ _1 ). However, since the model is only valid locally, 
we cannot guarantee <£ (v) < £(a(" -1 ^) ie. that using 
v as the next base point will lead to a reduction of the 
current objective value £ (a). Two families of approaches 
exist for resolving this problem: line search methods look 
for a suitable point along the line starting at a("-^ in the 
direction of v, while trust region methods consider reseat- 
ing the neighbourhood of a'™ -1 ) to which they constrain 
minimisation of the model for £. 

The main performance characteristic which can be re- 
liably affected in choosing between derivative-based op- 
timisation algorithms is the run-time required to achieve 
convergence to a given accuracy. When methods have 
different asymptotic rates of convergence, one of them is 
determined in advance to eventually be closer to the limit 
than all others. Prior to this regime, there is a trade-off 
in the accuracy of models used on each iteration, where 
more derivative information can be acquired at higher 
computational cost to construct better models, which 
should in contrast enable greater progress per iteration. 
In this respect, the Hessian matrix of all M (M + 1) /2 
second derivatives is prohibitively expensive to compute 
except for special (£, so that widely effective algorithms 
only require the gradient vector of M first derivatives. 

For example quasi-Newton optimisation methods, of 
which BFGS is an instance, use the model <B(a + p) ~ 
<£ (a) +g T p+ ^p T Bp where g is the gradient vector of (£ 
at a, and B is constructed iteratively to approximate the 
Hessian matrix of £ at a. At least for BFGS, the model 
minimum v, namely —B~ 1 g since B is always positive 
definite, is used within a line search routine at each iter- 
ation. In this context, steepest descent would correspond 
to replacing B by the identity matrix /, but we do not 
consider this method since it cannot be recommended in 
general, and performs particularly slowly on gate control 
problems Q. For comparison, convergence of BFGS it- 
erates to a local minimum a* where the Hessian of (£ is 
positive definite is understood to happen at a rate, mea- 
sured either by ||a(™) — a*|| or (B(a^) — (B (a*), between 
(£nlogn) H and n(£" 2 ) [| for some problem specific 
£. While convergence for conjugate gradient [B[ and lim- 
ited memory BFGS Q , without being restarted every M 
iterations only goes as £" asymptotically, which is also 
the rate for steepest descent albeit with £ deteriorating 
significantly 0] on poorly conditioned problems. 



Newton-Raphson method 

This description can be adapted to cover Newton- 
Raphson root finding applied to the vector valued func- 
tion £ : R M — > R m by introducing an objective function 
(E (x) = || £ (a;) || 2 and specifying that the model, hence 
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the derivatives used to construct it, should be of £ rather 
than £. The model in this case reads 



£(a + p) 



a; a 



Jp 



where J is the Jacobian matrix of £ at the point a, with 
entry i,j equal to the j th partial derivative g~ of the 

i th component of £ (x) - in the notation of differential 
geometry, we can succinctly write J as d£| a . 

If the model were exact, assuming the rank of J equals 
m, a global minimum of (£ could be found where £ 
is 0, namely at a + p with p being any solution to 
Jp = — £(a). Indeed the classical Newton- Raphson al- 
gorithm, applicable for dimensions M = m, uses the up- 
date rule a( n+1 ) = ay 1 ' +p for which <B(a^) goes to 
zero doubly exponentially in n, specifically as 0(/3 2 ) for 
some P (known as quadratic convergence), for suitable 
initial conditions a(°\ Generalising this to the under- 
determined regime M > m, the value of p is no longer 
explicitly determined as — J _1 £ (a), and a suitable choice 
for p which preserves the quadratic convergence property 
is the minimum norm solution of Jp = — £ (a) |8j, com- 
putable as — J T (J J 1 -)" 1 £ (a) . In either suitable 
initial would be any point sufficiently close to some 
solution a* of £(a*) = where the Jacobian d£| a * has 
full rank m, in such a way that d£| a <„) is always full rank 
so that all iterates are well-defined. 

Finding such an initial point is itself difficult, and for 
general points a the neighbourhood in which the model 
at a is accurate contains neither a root a* of £, nor 
a root a + p of the model. The rationale behind the 
update rule a(" +1 ) = -I- p, that a ( - n+1 ^ should be 
close to a root of £ since £(a(") + p) rj 0, therefore 
no longer applies for general and we are compelled 
to invoke a line search or trust region method to find a 
usable a( n+1 ). But before we delve further into this, let 
us deal more specifically with the function £ which we 
use for the unitary map control problem. Note in passing 
that for a general objective € which is strictly convex 
at some minimiser a*, applying Newton- Raphson to its 
gradient vector g (x) = d€\ x will seek a critical point 
of € so must converge to a* when starting sufficiently 
close to it. The direction of line searches would then be 
— J~ 1 g where J is the Hessian matrix of (£, justifying the 
approximating —B~ 1 g used in BFGS. 



Unitary map problem 

Consider the dynamical Lie algebra [ of our con- 
trol system, ie. the linear space spanned by all iter- 
ated commutator expressions starting with the matrices 
iHq, iH\ 1 . . . , iHr. By the Frobenius theorem, the prop- 
agators Uf (t) must remain within the associated group 
e , consisting of matrix exponentials of matrices in [, for 
all time and over all possible control vectors (f%, . . . , fn). 



Conversely, the controllability theory of bi-linear systems 
shows that for compact e 1 there is a critical time T c de- 
pending only on the system such that every point of e 1 is 
accessible in some time T <T C using some control vector 
f . This result also holds if the class of admissible con- 
trols is restricted to both smooth or piecewise constant 
functions. At least when in addition [ is semi-simple 
the set of unitary matrices Uf (T) which are accessible 
at a fixed time T using some control vector f has non- 
empty interior within e l [ltij ]. for each T > 0, and in fact 
equals e 1 for T sufficiently large. The conditions of e l be- 
ing compact and I semi-simple will be assumed in what 
follows - they hold in particular for the Lie algebra of 
traceless N x TV anti-Hermitian matrices su (N), which is 
of special interest as it corresponds to full controllability 
up to global phase. 

Once we have introduced a desired parametrisation of 
the controls, the discretised propagator U a (T) is a func- 
tion mapping WL RK — > U (TV), taking a vector a composed 
of all parameters a r k to the unitary matrix describing the 
evolution under the corresponding controls. Explicitly, 
U a (T) equals the functional Uf (T) composed with the 
synthesis operator taking a to the vector f of functions 
such that f r (t) = J2k=i a rkbk(t). Solving U a {T) = V 
can naturally be phrased as finding the root of either 
U a (T) - V or V^U a (T) - J, but both of these functions 
range over a non-linear space, making them unsuitable 
for use in the Newton-Raphson algorithm. 

Now let u (N) denote the real linear space of all N x 
N anti-Hermitian matrices with inner product (A, B) = 
Tr (A*B). Amongst maps taking the unitary group to a 
linear space, the inverse log : U (N) —} u (N) of the well- 
studied exponential map is a complex analytic function 
whose value at / is and derivative there is the identity 
on u (N) - for N > 2, it is crucially the only such function 
taking the global phase neglecting subgroup SU (N) to a 
linear space ll(. The most natural way to linearise our 
problem is therefore to look for a root of 

£(a) = Plog(yt[/ a (T)) 

where the branches of the logarithm are chosen to give 
a result in [ + ilRJ of minimal norm and P then ex- 
presses this in an orthonormal basis of I, dropping any 
il component. Note that when [ is su(N), the choice 
of branches for each eigenvalue is the standard one with 
imaginary part in (— 7r,7r], while the implementation of 
P can just keep each strictly upper triangular entry of 
the input matrix and expresses the imaginary part of its 
diagonal in any orthonormal basis containing the vector 
of all 1/yN. This choice for £ also has the advantage, 
when e is compact, of making the corresponding error 
yj € (a) = ||£(a) || equal the geodesic distance between 
U a (T) and V over the group e'/U(l), ie. e' quotiented 



out by global phase (see [12( Sect. 3.2). 

In its strong form Sard's theorem gives that the 
set of target gates V in e'/U (1) for which a solution a* 
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to U a * (T) = V exists with rank deficient Jacobian d£| a » 
is of Hausdorff dimension at most m — 1. Here m is still 
the dimension of the co-domain of £, namely [, which 
makes m — N' 2 — 1 in the most prominent case when I 
is su(N). Suppose the K, with M = RK > m, linearly 
independent basis functions b k are not so degenerate that 
the image of £ is a measure zero set. With respect to 
choosing a target V from the accessible set, of all U a (T) 
for fixed T, the full rank condition implying quadratic 
convergence of the Newton-Raphson algorithm therefore 
occurs with probability one. 

Computing the Jacobian 

The chain rule gives a decomposition for each column 

^(a)=P<ilo g \ vtUa(T) (v^U a (T)) (3) 

of the Jacobian of £, with the derivative of the discretised 
propagator known to be 

■7^—U a (T) = — f U a (T) U a (t) f lH r b k (t) U a (t) dt 

aa rk Jo 

(4) 

Writing W — V^U a {T), given that the composition 
exp o log is the identity on U (N) , the d log | w part of ^ 
is simply (dexp |i og (w)) , which admits a closed form 
expression. Indeed, for a general anti-Hermitian matrix 
A, if A r are the eigenvalues of A and A a correspond- 
ing matrix of eigenvectors (one per column), dexp \a (D) 
can be computed as e A A (T ■ (A* DA)) A*. Here the dot 
denotes elementwise multiplication and T is the matrix 
with entries T rs — 7 (A s — X r ) where 7 is the function 
2 1 y e continuously extended, so that 7(0) = 1 . 
So dlog\ w (B) is A((AtWtBA) /T) A+ where division 
is carried out elementwise and the X r and A are those 
from the eigen-decomposition of log (W). 

For computing the propagator U a (T), we use a fixed 
time stepping scheme which for s = 1, . . . , S successively 
evaluates the two point propagator U a (t s ,t s —i), defined 
as U a (t s ) U a (t s -i) , numerically where to = 0, ts = T, 
and t s — t s _i is the same for each s. In the piecewise con- 
stant control parametrisation case, we let S — K so that 
all the U a (t s ,t s -i) are matrix exponentials - comput- 
ing the eigen-decomposition of the constant Hamiltonian 
over each time interval (t s , £ s _i) then renders the compu- 
tation of both the propagators and their derivatives rel- 
atively inexpensive. In other cases, such as our Hermite 
function parametrisation, any general ODE solver could 
be used, but we found the Magnus-4 method [3] which 
is specialised for linear ODEs to be accurate for a smaller 
number of steps S than in particular the standard Runge- 
Kutta method, also of fourth order. To evaluate the inte- 
gral from (0| , given that we know U a at the endpoints of 
each interval (t a _ 1, t s ) a Lobatto quadrature rule is most 



appropriate, particularly the fourth order rule in order 
to match the accuracy of the propagator U a (t). This 
rule would approximate an integral J t " g (r) dr by the 
weighted sum 

, \ \ g{t s -i) + \ g((t s -i + 1„) /2) + \ g (t s ) 

t s - t s _i [6 3 6 

so that for the whole J Q T g (r) dr each value 
g (t\) , . . . , g (ts-i) enters twice. We also used polyno- 
mial interpolation, based on the value and first derivative 
at t s _i and t s , to evaluate U a at the midpoint of each 
interval (t s -i,t s ), together with precomputed values 
for each bk at the full set of quadrature nodes t s and 
(t,_i+t,)/2. 

Trust-region approach 

As described earlier, as soon as a is sufficiently close 
to a solution we should update a to a + p, where p is 
a solution to Jp = — £(a), on all subsequent iterations 
for which the Jacobian J = d£| a is full rank. However, 
we need an update rule which is effective in general, and 
for this purpose we have found in practise that far from 
a solution the trust region method readily delivers larger 
decreases in square error £ than doing a line search could, 
so for this reason we focus on the former. 

The trust region approach to the Newton-Raphson 
method consists in finding 

p = argmin|| Jx + £ (a) || 2 (5) 

for some trust region radius r, with the situation when 
the unconstrained minimum can be achieved requiring 
that we find the solution to Jx = — £ (a) having small- 
est norm. In the over-determined case m > RK, inde- 
pendently of whether the original root finding problem 
is solvable, the equation Jx = — £ (a) generically admits 
no solution so that minimising the norm of the residual 
Jx + £ (a) is the best we can aim for. In this case one 
would solve the optimisation problem directly as an 
instance 

argmin x J Jx + 2£ (a) Jx 

x : 1 1 x 1 1 ^ r 

of the so called trust region sub-problem, which con- 
sists in minimising a quadratic function x T Ax + 2g T x 
over a ball of radius r. Without loss of generality A 
can be taken symmetric and the ball centred at the ori- 
gin, then for A semi-definite as in our situation a solu- 
tion to the sub-problem can always be found of the form 
x\ = — (A — XI) 1 g for some A ^ 0. To find the appro- 
priate A, we use the higher order analogue of the method 
from [l5( , although for ease of implementation a general 
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convex optimisation package such as CVXOPT could be 
invoked to solve the sub-problem itself. 

Otherwise when m < RK, we can restrict attention to 
x orthogonal to the null space of J by expressing it as 
x = J T y, reducing the problem to 

argmin || JJ T y + £ (a) || 2 = 

argmin y T M 2 y + 2£ (a) T My (6) 

y:y T My^r 2 

where the matrix M — JJ T is defined on the co-domain 
of £. This last problem can be solved through the trust 
region sub-problem instance 

argmin z T Mz + 2£ (a) T VMz 

then using any solution of VMy — z, all of them being 
equivalent in that they yield the same final x. Such a 
reformulation is advantageous since it makes the corre- 
sponding matrix A lower dimensional, and importantly 
for performance, instances of \[M in the algorithm we 
use appear in such a way that it never needs to be com- 
puted. 

In adapting the choice of trust region radius r, we 
strive on each iteration to use the value tq of r for which 
(E(a + p r ) attains its minimum, where p r is the solu- 
tion from ([5]) with radius r. Note that by definition 
any choice of radius r above the norm r* of the least 
square solution to Jx = — £ (a) is equivalent, moreover 
along a + p r the model for l£, namely ||£(a) + Jp r \\ 2 , 
is strictly decreasing as r ranges over [0, r*]. As long as 
the model for £ (a + p r ) is accurate, the true value of (£ 
along a + p r must track its model value and therefore be 
decreasing - but once r is large enough that the model 
for £ starts to break down in the vicinity of a+p r , the in- 
crements p r+ g — p r quickly become meaningless, making 
them overwhelmingly likely to lead the true £(a + p r ) 
to increase. This causes the relative error of the model 
for <£ (a + p r ) — <£ (a) to undergo a swift transition from 
small, in fact vanishing as r — > 0, to large magnitudes as 
r grows past the minimiser tq. In our implementation, 
we adjusted r to make this relative error satisfy 

II £ (a) + Jp r \\ 2 - (£(a) 
\<£(a + p r )-<£(a)\ 

which typically places r close to rn , although the choice 
is a valid one irrespectively (see [l| Sect. 4.0). 

Norm dependent structure 

At the null control vector ie. a = 0, the propaga- 
tor U a it) reduces to the matrix exponential e~ lHot , so 
that working in an eigenbasis of Hq it is easy to see that 



within any U a (t) H r U a (t) expression from ((4]), each di- 
agonal entry will be a constant function. Therefore over 
su(N), only R out of N — 1 possible linear combina- 
tions of diagonal entries can be generated by any inte- 
gral J Q T U a (i) f H r b (t) U a (t) dt where b is a scalar valued 
function. Hence the Jacobian at null controls of U a (t) , 
thus also of £, will be rank deficient for non-trivial sys- 
tems, since it would be unrealistic for a system to have 
R ^ N — 1 controls unless it were of very low dimension 
N. Then the model for £ at a = almost surely admits 
no exact solution, so that the ill-conditioning defined as 
|| J T ( JJ T ) _1 £ (a) || is effectively infinite there, and by 
continuity tends to infinity as ||a|| — > 0. 

At the other extreme, f being large introduces high 
frequency oscillation in Uf (t) and Uf (t) H r Uf (t), which 
cancels out when integrating Uf (t) H r bk(t)Uf (t) for 
any fixed basis elements 6fc. In other words, eigenf unc- 
tions of the infinite dimensional Jacobian dUf (T) have 
a lot of their spectrum in high Fourier components, and 
this is lost when restricting to the lower frequency sub- 
space spanned by the bk to obtain the finite dimensional 
J. As a consequence the singular values of the discretised 
Jacobian J shrink as ||a|| grows, with the corresponding 
ill-conditioning almost surely going to infinity. This ex- 
plains why the ill-conditioning curve from Fig. 2(a) grows 
towards both small and large norms, thereby attaining its 
minimum at some finite norm value. 

For any given target gate V, there is a minimal norm /j, 
below which no solution in the chosen basis can be found 
to the control problem, up to the tolerated error. Due 
to the high dimensionality RK of the discretised control 
space, the volume of parameter vectors a below some 
norm x increases extremely quickly with x, eg. for our 
test problem increasing x by 10% will make the volume 
grow by a factor of over 10 82 . While it is tautological 
that any successful algorithm run must terminate with 
||a|| above fi, there should be no shortage of solutions 
with norms slightly above fj,, so we can expect the final 
norm to be close to [i when starting with ||a^|| < ^t. 
When the norm of any iterate is above /1, it is rea- 
sonable to assume the update a(" +1 ) — has no pre- 
ferred direction, which by the high dimensionality would 
imply it is near orthogonal to a/™) with high probability. 
Since the trust region radius r, hence the update, should 
be noticeably smaller than the current iterate a/"' by a 
factor l/p ^> 1 say, we would conclude that ||a(™ +1 )|| is 
only a factor of 1 + p 2 /2 greater than ||a'™)||. Therefore 
when starting with || a.^ - 1 1 1 > fx, none of the algorithm it- 
erations change the norm of a substantially, making the 
final norm close to the initial norm. These considerations 
account for the characteristic shape of the curve in Fig. 
2(c), which matches the identity function down to some 
floor level, presumably equal to /x, below which it hovers 
just above the floor level. 

Given that the ill-conditioning measures how difficult 



9 



it is to decrease the error on a given iteration, when 
|| a.(°) || > /X and all iterates a have the same norm, runs of 
the algorithm are faster if and only if the ill-conditioning 
is lower for this norm. Otherwise the norm of iterates a 
increases up to some value above fi, but since the ill- 
conditioning curve is increasing below [i, lower initial 
norms lead to runs being slower. The runtime does not 
however blow up as Ha^ll — > because even starting 
from 0, although the first iteration may only reduce the 
error by a negligible amount, the resulting ||a^ 1 -'|| equal 
to the trust region radius will be substantial. Finally, 
this explains the correspondence in Fig. 2 between the 
runtime curve and ill-conditioning curve. 



Test problem 

The system used for numerical illustrations in this let- 
ter was a chain of five qubits with nearest neighbour 
Ising coupling, with a linear gradient inhomogeneity in 
the magnetic field to enable some degree of frequency se- 
lective addressing. Explicitly, the intrinsic Hamiltonian 
is 



r (n+l) _ 



r (") 



with frequencies w„ = n+2, and where a 
matrix acting on the n th spin, eg. 
The control Hamiltonians are 



(n) 



is the Pauli z 

a,® I® I® I. 



Hi 



5 



r («) 



Ho 



n=l 



corresponding to the x-coordinate and y-coordinate com- 
ponents f i and f2 of an electric pulse applied simultane- 
ously to all qubits. For this problem, the total evolution 
time is fixed at T — 125, and the number of basis func- 
tions (per control) is chosen to be K = 1000, with all 
data in the first three figures coming from using piece- 
wise constant controls. By piecewise constant, we for- 
mally mean that bi{t) is vanishing for t outside the in- 
terval (Q,T/K) and constant equal to one inside, with 
each bk equal to the previous bk-i translated forward in 
time by T/K . The Hermite functions refer to the eigen- 
functions of the quantum harmonic oscillator, shifted to 
be centred at T/2, and jointly scaled about T/2 so that 
the maximum any of the first K attain outside (0,T) is 
10~ 8 . Moreover, the definition of integrated power norm 
for the control vector f we use satisfies 



R .1 
|f " 2 = £j( 



|fr (*) I'd* 



which equals the standard Euclidian norm of the param- 
eter vector a when the basis functions &x> . . . , &r- are or- 
thonormal. 
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FIG. 5: Corresponding to Fig. 1 from the main text, here 
for our second test problem. Performance comparison be- 
tween several runs of the Newton-Raphson (red, labeled 'N'), 
BFGS GRAPE (green, labeled 'B') and Krotov (blue, la- 
beled 'K') algorithms with moderately sized initial pulses 
||f (0) || = 20. Also shown (in black, labeled 'A') are Newton- 
Raphson runs preceded by a routine to find the norm with 
least ill-conditioning. 



As a second test problem, we can consider implement- 
ing a logical T-gate encoded with the five physical qubit 
stabilizer code as described in [l6j|- The underlying sys- 
tem is a Heisenberg spin chain of length five, with a fixed 
external coupling field at a Rabi frequency of 10, so that 
H reads 



71=1 



n=l 



with an evolution time T = 90. This has a single control 
corresponding to Hi — enabling the first spin to 
be detuned, through a local voltage which is piecewise 
constant over K = 1500 intervals. 
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FIG. 6: Direct analogue for our second test problem of Fig. 
2 from the main text. For Newton-Raphson runs with initial 
pulses f' ' of different norms, (b) the wall time needed to 
reach an error e of 10" 4 and (c) the norm of the corresponding 
solution pulses, with a dashed 'initial equals final' line. In 
addition, (a) the ill-conditioning of the Jacobian at several 
randomly sampled pulses of each norm. 



